MPCI : An R Package for Computing Multivariate Process Capability Indices
Manufacturing processes are often based on more than one quality characteristic. When these variables are correlated, the process capability analysis should be performed using multivariate statistical methodologies. Although there is a growing interest in methods for evaluating the capability of multivariate processes, little attention has been given to developing user-friendly software for supporting multivariate capability analysis. In this work we introduce the package MPCI for R, which computes multivariate process capability indices. MPCI aims to provide a useful tool for dealing with multivariate capability assessment problems. We illustrate the use of the MPCI package through both simulated and real examples.
Graph Neural Network-Based Anomaly Detection for River Network Systems
Water is the lifeblood of river networks, and its quality plays a crucial
role in sustaining both aquatic ecosystems and human societies. Real-time
monitoring of water quality is increasingly reliant on in-situ sensor
technology. Anomaly detection is crucial for identifying erroneous patterns in
sensor data, but can be a challenging task due to the complexity and
variability of the data, even under normal conditions. This paper presents a
solution to the challenging task of anomaly detection for river network sensor
data, which is essential for accurate and continuous monitoring. We use a graph
neural network model, the recently proposed Graph Deviation Network (GDN),
which employs graph attention-based forecasting to capture the complex
spatio-temporal relationships between sensors. We propose an alternate anomaly
scoring method, GDN+, based on the learned graph. To evaluate the model's
efficacy, we introduce new benchmarking simulation experiments with
highly sophisticated dependency structures and subsequence anomalies of various
types. We further examine the strengths and weaknesses of this baseline
approach, GDN, in comparison to other benchmarking methods on complex
real-world river network data. Findings suggest that GDN+ outperforms the
baseline approach in high-dimensional data, while also providing improved
interpretability. We also introduce software called gnnad.
Correcting misclassification errors in crowdsourced ecological data: A Bayesian perspective
Many research domains use data elicited from "citizen scientists" when a
direct measure of a process is expensive or infeasible. However, participants
may report incorrect estimates or classifications due to their lack of skill.
We demonstrate how Bayesian hierarchical models can be used to learn about
latent variables of interest, while accounting for the participants' abilities.
The model is described in the context of an ecological application that
involves crowdsourced classifications of georeferenced coral-reef images from
the Great Barrier Reef, Australia. The latent variable of interest is the
proportion of coral cover, which is a common indicator of coral reef health.
The participants' abilities are expressed in terms of sensitivity and
specificity of a correctly classified set of points on the images. The model
also incorporates a spatial component, which allows prediction of the latent
variable in locations that have not been surveyed. We show that the model
outperforms traditional weighted-regression approaches used to account for
uncertainty in citizen science data. Our approach produces more accurate
regression coefficients and provides a better characterization of the latent
process of interest. This new method is implemented in the probabilistic
programming language Stan and can be applied to a wide range of problems that
rely on uncertain citizen science data.
Comment: 18 figures, 5 tables
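To make the role of sensitivity and specificity concrete, the sketch below shows a simple moment-based correction (the Rogan-Gladen estimator) for a proportion observed through an imperfect classifier. This is not the Bayesian hierarchical model described in the abstract, only an illustration of why participant abilities matter; the function name and the numbers are hypothetical.

```python
def corrected_proportion(observed, sensitivity, specificity):
    """Adjust an observed proportion for misclassification.

    If the true proportion is p, an imperfect classifier reports
        observed = sensitivity * p + (1 - specificity) * (1 - p).
    Solving for p gives the estimate returned here.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("classifier must perform better than chance")
    estimate = (observed + specificity - 1.0) / denom
    # Clip to the valid range, since sampling noise can push the estimate outside [0, 1].
    return min(1.0, max(0.0, estimate))


# Hypothetical example: true coral cover 0.4, sensitivity 0.9, specificity 0.8.
true_cover = 0.4
observed = 0.9 * true_cover + (1 - 0.8) * (1 - true_cover)  # what participants report
recovered = corrected_proportion(observed, sensitivity=0.9, specificity=0.8)
```

Here the uncorrected proportion overstates the true cover, and the correction recovers it exactly because the abilities are assumed known. A hierarchical model would instead estimate the abilities and the latent proportion jointly.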
Bayesian spatio-temporal models for stream networks
Spatio-temporal models are widely used in many research areas including
ecology. The recent proliferation of the use of in-situ sensors in streams and
rivers supports space-time water quality modelling and monitoring in near
real-time. In this paper, we introduce a new family of dynamic spatio-temporal
models, in which spatial dependence is established based on stream distance and
temporal autocorrelation is incorporated using vector autoregression
approaches. We propose several variations of these novel models using a
Bayesian framework. Our results show that our proposed models perform well
using spatio-temporal data collected from real stream networks, particularly in
terms of out-of-sample RMSPE. This is illustrated considering a case study of
water temperature data in the northwestern United States.
Comment: 26 pages, 10 figs
Increasing trust in new data sources: crowdsourcing image classification for ecology
Crowdsourcing methods facilitate the production of scientific information by
non-experts. This form of citizen science (CS) is becoming a key source of
complementary data in many fields to inform data-driven decisions and study
challenging problems. However, concerns about the validity of these data often
constrain their utility. In this paper, we focus on the use of citizen science
data in addressing complex challenges in environmental conservation. We
consider this issue from three perspectives. First, we present a literature
scan of papers that have employed Bayesian models with citizen science in
ecology. Second, we compare several popular majority vote algorithms and
introduce a Bayesian item response model that estimates and accounts for
participants' abilities after adjusting for the difficulty of the images they
have classified. The model also enables participants to be clustered into
groups based on ability. Third, we apply the model in a case study involving
the classification of corals from underwater images from the Great Barrier
Reef, Australia. We show that the model achieved superior results in general
and, for difficult tasks, a weighted consensus method that uses only groups of
experts and experienced participants produced better performance measures.
Moreover, we found that participants learn as they have more classification
opportunities, which substantially increases their abilities over time.
Overall, the paper demonstrates the feasibility of CS for answering complex and
challenging ecological questions when these data are appropriately analysed.
This serves as motivation for future work to increase the efficacy and
trustworthiness of this emerging source of data.
Comment: 25 pages, 10 figures
clusterBMA: Bayesian model averaging for clustering
Various methods have been developed to combine inference across multiple sets
of results for unsupervised clustering, within the ensemble clustering
literature. The approach of reporting results from one 'best' model out of
several candidate clustering models generally ignores the uncertainty that
arises from model selection, and results in inferences that are sensitive to
the particular model and parameters chosen. Bayesian model averaging (BMA) is a
popular approach for combining results across multiple models that offers some
attractive benefits in this setting, including probabilistic interpretation of
the combined cluster structure and quantification of model-based uncertainty.
In this work we introduce clusterBMA, a method that enables weighted model
averaging across results from multiple unsupervised clustering algorithms. We
use clustering internal validation criteria to develop an approximation of the
posterior model probability, used for weighting the results from each model.
From a consensus matrix representing a weighted average of the clustering
solutions across models, we apply symmetric simplex matrix factorisation to
calculate final probabilistic cluster allocations. In addition to outperforming
other ensemble clustering methods on simulated data, clusterBMA offers unique
features including probabilistic allocation to averaged clusters, combining
allocation probabilities from 'hard' and 'soft' clustering algorithms, and
measuring model-based uncertainty in averaged cluster allocation. This method
is implemented in an accompanying R package of the same name.
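The weighted consensus matrix mentioned in this abstract can be sketched as follows for 'hard' clusterings. This is an illustrative sketch only and not the clusterBMA package's code: the function names are hypothetical, and the weights are placeholder values standing in for the approximate posterior model probabilities, whose formula the abstract does not give.

```python
import numpy as np


def coassignment(labels):
    """Return the n x n 0/1 matrix with entry (i, j) = 1 when points i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def weighted_consensus(labelings, weights):
    """Average the co-assignment matrices of several clusterings using normalised weights."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * coassignment(lab) for w, lab in zip(weights, labelings))


# Three hypothetical clusterings of four points, with assumed model weights.
labelings = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
weights = [0.5, 0.3, 0.2]
consensus = weighted_consensus(labelings, weights)
```

Each entry of `consensus` is the weighted share of models that place the two points in the same cluster. Label names are irrelevant, so the first and third clusterings above count as identical. The factorisation step that turns this matrix into probabilistic allocations is not shown.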
Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences
Classifying proteins into subgroups with similar molecular function on the basis of sequence is an important step in deriving reliable functional annotations computationally. So far, however, available classification procedures have been evaluated against protein subgroups that are defined by experts using mainly qualitative descriptions of molecular function. Recently, in vitro DNA-binding preferences to all possible 8-nt DNA sequences have been measured for 178 mouse homeodomains using protein-binding microarrays, offering the unprecedented opportunity of evaluating the classification methods against quantitative measures of molecular function. To this end, we automatically derive homeodomain subtypes from the DNA-binding data and independently group the same domains using sequence information alone. We test five sequence-based methods, which use different sequence-similarity measures and algorithms to group sequences. Results show that methods that optimize the classification robustness reflect well the detailed functional specificity revealed by the experimental data. In some of these classifications, 73–83% of the subfamilies exactly correspond to, or are completely contained in, the function-based subtypes. Our findings demonstrate that certain sequence-based classifications are capable of yielding very specific molecular function annotations. The availability of quantitative descriptions of molecular function, such as DNA-binding data, will be a key factor in exploiting this potential in the future.
Canadian Institutes of Health Research (MOP#82940); Sickkids Foundation; Ontario Research Fund; National Science Foundation (U.S.); National Human Genome Research Institute (U.S.) (R01 HG003985)
The multifaceted roles of perlecan in fibrosis
Perlecan, or heparan sulfate proteoglycan 2 (HSPG2), is a ubiquitous heparan sulfate proteoglycan that has major roles in tissue and organ development and wound healing by orchestrating the binding and signaling of mitogens and morphogens to cells in a temporal and dynamic fashion. This review examines its roles in fibrosis by drawing upon evidence from tissue and organ systems that undergo fibrosis as a result of an uncontrolled response to either inflammation or traumatic cellular injury, leading to an overproduction of a collagen-rich extracellular matrix. The review focuses on examples of fibrosis that occur in lung, liver, kidney, skin, neural tissues and blood vessels, and on their link to the expression of perlecan in that particular organ system.